feat: Improve process termination logic in multiprocess manager #2371

abersheeran · 2024-06-25T07:30:25Z

Summary

In our cluster, we discovered zombie processes by accident and tracked down the cause. Uvicorn's new process manager joins child processes one by one after sending all exit signals. When an earlier child process takes a long time to exit, the later child processes cannot be joined.

I noticed that Uvicorn already has a built-in shutdown timeout, and it would be nice to use it in the multiprocess manager as well.

We don't terminate and join each process in sequence because signalling every process first shuts them all down faster, as mentioned in PR #2010.

Checklist

I understand that this PR may be closed in case there was no previous discussion. (This doesn't apply to typos!)
I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
I've updated the documentation accordingly.

abersheeran · 2024-06-25T07:32:20Z

I will add unit tests later, but I have no experience in designing a process that will hang, so if anyone can help, it would be greatly appreciated.

Kludex · 2024-07-31T21:18:57Z

> I will add unit tests later, but I have no experience in designing a process that will hang, so if anyone can help, it would be greatly appreciated.

I'll try to check this over the weekend. Sorry for the delay.

For next time: your PRs always have priority, @abersheeran. Please ping me if I take too long.

Kludex · 2024-08-11T07:18:24Z

I'm not sure if we should use the same timeout_graceful_shutdown 🤔

abersheeran · 2024-09-06T02:40:14Z

Do you have any new ideas? We really need a configurable timeout here.

Kludex · 2024-09-27T06:22:43Z

uvicorn/supervisors/multiprocess.py

+    def join(self, join_timeout: float | None = None) -> None:
        logger.info(f"Waiting for child process [{self.process.pid}]")
-        self.process.join()
+        self.process.join(join_timeout)
+        # Timeout, kill the process
+        while self.process.exitcode is None:
+            self.process.kill()
+            self.process.join(1)


Why do we have a join(1) here?

abersheeran · 2024-09-27

It waits for the kill to take effect. If the process has not exited within 1 second, the kill signal is sent again.

abersheeran · 2024-09-27T08:15:44Z

uvicorn/supervisors/multiprocess.py

+        self.process.join(timeout)
+        # Timeout, kill the process
+        while self.process.exitcode is None:
+            self.process.kill()


CI failed because these lines are not covered by a test, and I don't know how to design a process that is guaranteed to hang.

Kludex · 2024-09-28

It's okay to add the pragma here.
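For reference, one possible way to get a child that reliably survives `terminate()` on POSIX is to have it ignore SIGTERM and report readiness before the parent sends the signal. This is only a sketch under those assumptions; `_ignore_sigterm` and `start_stuck_child` are hypothetical helpers, not part of Uvicorn or its test suite.

```python
import multiprocessing
import signal
import time


def _ignore_sigterm(ready):
    """Child target that survives terminate(): ignores SIGTERM, never returns."""
    signal.signal(signal.SIGTERM, signal.SIG_IGN)
    ready.set()  # tell the parent that SIGTERM is now ignored
    while True:
        time.sleep(1)


def start_stuck_child():
    ctx = multiprocessing.get_context("fork")  # assumption: POSIX platform
    ready = ctx.Event()
    child = ctx.Process(target=_ignore_sigterm, args=(ready,))
    child.start()
    ready.wait(5)  # avoid racing terminate() against the handler setup
    return child


if __name__ == "__main__":
    child = start_stuck_child()
    child.terminate()
    child.join(0.5)
    print("alive after terminate:", child.exitcode is None)
    child.kill()
    child.join()
    print("exit code after kill:", child.exitcode)
```

A test built on this could assert that the child is still running after the join timeout and is gone once the kill path has run.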

feat: Improve process termination logic in multiprocess manager

9be4c17

abersheeran requested a review from Kludex June 25, 2024 07:32

Kludex self-assigned this Jul 31, 2024

Kludex reviewed Sep 27, 2024

abersheeran and others added 3 commits September 27, 2024 15:52

Update uvicorn/supervisors/multiprocess.py

da67e21

Co-authored-by: Marcelo Trylesinski <[email protected]>

Merge branch 'master' into use-timeout-graceful-shutdown

9917da6

fix name changed

4fc41f9

abersheeran commented Sep 27, 2024


Kludex commented Sep 28, 2024
